Python Data Mining Quick Start Guide by Nathan Greeneltch
Author: Nathan Greeneltch
Language: eng
Format: epub
Tags: COM018000 - COMPUTERS / Data Processing, COM062000 - COMPUTERS / Data Modeling and Design, COM089000 - COMPUTERS / Data Visualization
Publisher: Packt
Published: 2019-04-24T11:20:04+00:00
PCA
PCA (principal component analysis) reduces the dimensions of data in an unsupervised manner. The method's goal is to identify new feature vectors that maximize the variance in the data, and then project the original data into the space defined by those vectors. Please revisit the short example in the previous section for an intuitive description.
The new feature vectors that maximize variance are the eigenvectors of the data's covariance matrix, and they are the principal components. There is at most one component for each original feature. The power of this method comes when you drop the less important components and keep only those with the most informative content, thus lowering the dimensions. A fitted scikit-learn PCA object has an explained_variance_ attribute that can be used to rank the importance of each principal component. More commonly in data mining, you will use the n_components argument to specify a new, lower number of dimensions and let scikit-learn sort the components by variance and drop the rest automatically.
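As a minimal sketch of the two approaches just described, the following fits PCA to the iris dataset twice: once keeping every component in order to inspect explained_variance_, and once with n_components=2 to reduce the data directly. The variable names are illustrative, and the choice of two components is an assumption made for this example only.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Iris feature matrix: 150 samples, 4 original features
X = load_iris().data

# Fit with all components kept, to inspect how the variance is distributed
pca_full = PCA()
pca_full.fit(X)
explained = pca_full.explained_variance_      # variance along each component
ratio = pca_full.explained_variance_ratio_    # the same, as a fraction of the total

for i, (var, frac) in enumerate(zip(explained, ratio), start=1):
    print(f"component {i}: variance={var:.3f}, fraction={frac:.3f}")

# Keep only the two highest-variance components (assumed target dimension)
pca_2 = PCA(n_components=2)
X_reduced = pca_2.fit_transform(X)
print(X_reduced.shape)
```

The components come back already sorted from highest to lowest variance, so setting n_components=2 keeps the first two entries of that ranking.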
In the following PCA example, the raw scatter plot of the iris dataset is on the left. The greatest variation is captured in the direction of the red arrow ("PCA1"), and the runner-up is the orthogonal direction that is captured by the black arrow ("PCA2"). Now imagine rotating the dataset so that the two axes are the first two principal components. Finally, study the PCA scatter plot on the right, where the axes are the directions "PCA1" and "PCA2".
The connection between the right and left scatters should be clear in your mind before you move on from this section. It's this kind of intuition that will allow you to do powerful analysis while also knowing what the underlying mathematics is doing. The methods in this book are not black boxes, and you should force yourself to learn and understand them. You almost certainly do yourself a disservice as a data mining practitioner otherwise.
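One way to check this intuition numerically is the sketch below, which treats PCA on two iris features as a centering step followed by a projection onto the principal directions. The choice of columns 0 and 2 (sepal length and petal length) is an assumption, since the text does not say which features the figure plots.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: two iris features chosen for illustration (sepal length, petal length)
X = load_iris().data[:, [0, 2]]

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Reproduce the transform by hand: center the data, then project onto the components
X_centered = X - pca.mean_
X_manual = X_centered @ pca.components_.T
matches = np.allclose(X_pca, X_manual)

# The two principal directions are orthogonal, so their dot product is ~0
dot = float(pca.components_[0] @ pca.components_[1])

# Variance along each new axis; the first axis carries the most
var_new_axes = X_pca.var(axis=0, ddof=1)

# With both components kept, total variance is unchanged by the change of axes
total_before = float(X.var(axis=0, ddof=1).sum())
total_after = float(var_new_axes.sum())

print(matches, dot, var_new_axes, total_before, total_after)
```

With both components kept, no information is lost: the points are the same, only the axes have changed. The reduction in dimensions happens only when the lower-variance axis is dropped.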
